###Simulating Trajectories

tmax <- 10
X1_0 <- 600
X2_0 <- 30
X3_0 <-10^5
rho <- 0.108
delta <- 0.5
eta <- 9.5*10^(-6)
lambda <- 36
N1 <- 1000
C <- 3
sigma1 <- 0.1
sigma2 <- 0.1
sigma2 <- 0.1
make_path <- function(tmax, x1_0, x2_0, x3_0, rho, delta, eta, lambda, N1, C, sigma1, sigma2, sigma3, deltat, n){
  #Initializing
  nosteps <- tmax/deltat
  x1 <- numeric(nosteps + 1)
  x2 <- numeric(nosteps + 1)
  x3 <- numeric(nosteps + 1)
  x <- matrix(data = c(x1, x2, x3), ncol = 3)
  x[1,] <- c(x1_0, x2_0, x3_0)
  sigma <- c(sigma1, sigma2, sigma3)
  
  #Simulationg the driving BM
  dW1 <- rnorm(n = nosteps, mean = 0, sd = sqrt(deltat))
  dW2 <- rnorm(n = nosteps, mean = 0, sd = sqrt(deltat))
  dW3 <- rnorm(n = nosteps, mean = 0, sd = sqrt(deltat))
  dW <- matrix(data = c(dW1, dW2, dW3), ncol = 3)
  
  #Defining drift function
  drift <- function(x){
    x1 <- x[1]
    x2 <- x[2]
    x3 <- x[3]
    y1 <- lambda - rho*x1 - eta*x1*x3
    y2 <- eta*x1*x3 - delta*x2
    y3 <- N1*delta*x2 - C*x3
    y <- c(y1, y2, y3)
    y
  }
  
  #Iterating
  for (k in (1:(nosteps))){
    x[k+1,] <- x[k,] + drift(x[k,])*deltat + sigma*x[k,]*dW[k,] + 0.5*sigma*x[k,]*sigma*(dW[k,]*dW[k,] - deltat)
  }
  
  #Creating output and adding time variable
  t <- (0:nosteps)*deltat
  out <- data.frame(x, t)
  colnames(out) <- c("x1", "x2", "x3", 't')
  out
}
subsample <- function(x){
  x <- as.matrix(x)
  
  x_fine <- matrix(data = 0, nrow = 500, ncol = 4)
  for (i in (1:500)){
    x_fine[i,] <- x[20*(i-1) + 1,]
  }
  x_fine <- as.data.frame(x_fine)
  colnames(x_fine) <- c('x1','x2','x3','t')
  
  x_coarse <- matrix(data = 0, nrow = 100, ncol = 4)
  for (i in (1:100)){
    x_coarse[i,] <- x[100*(i-1) + 1,]
  }
  x_coarse <- as.data.frame(x_coarse)
  colnames(x_coarse) <- c('x1','x2','x3','t')
  
  out <- list(x_fine, x_coarse)
  out
}
path <- make_path(tmax = 10, x1_0 = 600, x2_0 = 30, x3_0 = 10^5, rho = 0.108, delta = 0.5, eta = 9.5*10^(-6), lambda = 36, N1 = 1000, C = 3, sigma1 = 0.1, sigma2 = 0.1, sigma3 = 0.1, deltat = 0.001)
sampled_path <- subsample(path)
x_fine <- sampled_path[[1]]
x_coarse <- sampled_path[[2]]

###Plotting the trajectory

path_plot_fine <- plot_ly(x_fine, x = ~x1, y = ~x2, z = ~x3,
                          type = 'scatter3d', mode = 'lines+markers',
                          line = list(width = 6, color = ~t, colorscale = 'Viridis'),
                          marker = list(size = 3.5, color = ~t, colorscale = 'Greens', cmin = -20, cmax = 50))
path_plot_fine
path_plot_coarse <- plot_ly(x_coarse, x = ~x1, y = ~x2, z = ~x3,
                          type = 'scatter3d', mode = 'lines+markers',
                          line = list(width = 6, color = ~t, colorscale = 'Viridis'),
                          marker = list(size = 3.5, color = ~t, colorscale = 'Greens', cmin = -20, cmax = 50))
path_plot_coarse

Euler-Maruyama Estimator

For making inference, we need to have a likelihood for a ny given parameter \(\theta\). This will be the sum of the likelihoods of each step, under the EM-hypothesis. Thus, for each increment, we need to assign a likelihood of that increment, given the parameter vector. In this case, the stochastic increments to each coordinate are independent, due to the diagonal structure of \(\Sigma\). So the stochastic quantities here are Normal and Chi-square distributed. But we have a sum here, but one is a deterministic transformation of the other. But there might be more that one dW that give the same increment. In that case i suppose we have to assign both the likelihoods. Do we need to take a convolution to find the density of the sum? Apparently, we are to use a Gaussian approximation as given on the slides. The books probably cover this as well. Upon looking at the slides again, the EM-based estimator seems quite doable. Implementing likelihood evaluation for single step

likelihood_step <- function(theta, x, x_next, deltat){
  #Defining tunable and constant parameters
  lambda <- theta[1]
  N1 <- theta[2]
  C <- theta[3]
  sigma1 <- theta[4]
  sigma2 <- theta[5]
  sigma3 <- theta[6]
  
  #Defining drift function
  drift <- function(x){
    x1 <- x[[1]]
    x2 <- x[[2]]
    x3 <- x[[3]]
    y <- numeric(length = 3)
    y[1] <- lambda - rho*x1 - eta*x1*x3
    y[2] <- eta*x1*x3 - delta*x2
    y[3] <- N1*delta*x2 - C*x3
    y
  }
  # need to assign subparameters of theta to the names used in the code
  z <- numeric(3)
  z[1] <- x[[1]]
  z[2] <- x[[2]]
  z[3] <- x[[3]]
  #z_next <- numeric(3)
  #z_next[1] <- x_next
  #browser()
  y <- drift(x)
  a <- log(((2*pi*deltat)^3) * (z[1]*z[2]*z[3]*sigma1*sigma2*sigma3)^2)
  b <- ((x_next - x + y*deltat)^2)*c((z[1]*sigma1)^(-2), (z[2]*sigma2)^(-2), (z[3]*sigma3)^(-2))/deltat
  c <- b[[1]]*b[[2]]*b[[3]]
  a+c
  }
#Test chunk
theta <- c(lambda, N1, C, sigma1, sigma2, sigma3)
likelihood_step(theta = theta, x = x_coarse[1,], x_next = x_coarse[2,], deltat = 0.1)
[1] 119262182
#We get a number, so that's good.

Implementing likelihood for entire process

likelihood_total <- function(theta, process, deltat){
  nosteps <- length(process[,1])-1
  cumulative_likelihood <- 0
  for (i in (1:nosteps)){
    cumulative_likelihood <- cumulative_likelihood + likelihood_step(theta = theta, x = process[i,], x_next = process[i+1,], deltat = deltat)
  }
  cumulative_likelihood
}

Testing total likelihood function

likelihood_total(theta = theta, process = x_coarse, deltat = 0.1)
[1] 125389127
#We get a number, which is promising
likelihood_total(theta = theta_0, process = x_coarse, deltat = 0.1)
[1] 5108.032

We see that evaluating the total likelihood function returns the same number for two different set of parameters. This indicates that that the total likelihood fucntion doesn’t in fact use the parameter provided for evaluation. It seems that likelihood_step takes its parameters from the global envieronment rather than its arguement. Ths would explain why the likelihood is evaluated to the same for any parameter. Now we got different numbers.

Now, we try to optimize this for some parameter vector theta.

theta_0 <- c(50, 2000, 10, 0.5, 0.5, 0.5)
optim(par = theta_0, fn = likelihood_total, process = x_coarse, deltat = 0.1, lower = c(0,0,0,0,0,0), method = "L-BFGS-B")
Error in optim(par = theta_0, fn = likelihood_total, process = x_coarse,  : 
  L-BFGS-B needs finite values of 'fn'

It would appear that i have made a mistake: Optim outputs the initial parameters, and states that convergence was not reached. Another mistake is not finishing a coding project before taking a 2-week break: It’s a little difficult to jumo in again.

It takes some time to run now, and we don’t get convergence or the right results. The total likelihood function was called over 400 times, probably because of lack of convergence, which might also be why it takes so long. Question: are we maximizing or minimizing? What is the parameter space? What exactly is the quantity being computed in the step likelihood function? I think it is the likelihood, formula from the slides. Let’s try with some parameters that are closer to the real ones. We get completely atrocious estimates. We try bounding barameter values from below with zero and using L-BFGS-B method. It seems that we get infinite values from total likelihood, which is unfortunate. Maybe the log is fucking me?

---
title: "ISDE Exercises Week 4"
output: html_notebook
---

###Simulating Trajectories

```{r}
tmax <- 10
X1_0 <- 600
X2_0 <- 30
X3_0 <-10^5
rho <- 0.108
delta <- 0.5
eta <- 9.5*10^(-6)
lambda <- 36
N1 <- 1000
C <- 3
sigma1 <- 0.1
sigma2 <- 0.1
sigma3 <- 0.1
```

```{r}
make_path <- function(tmax, x1_0, x2_0, x3_0, rho, delta, eta, lambda, N1, C, sigma1, sigma2, sigma3, deltat, n){
  #Initializing
  nosteps <- tmax/deltat
  x1 <- numeric(nosteps + 1)
  x2 <- numeric(nosteps + 1)
  x3 <- numeric(nosteps + 1)
  x <- matrix(data = c(x1, x2, x3), ncol = 3)
  x[1,] <- c(x1_0, x2_0, x3_0)
  sigma <- c(sigma1, sigma2, sigma3)
  
  #Simulationg the driving BM
  dW1 <- rnorm(n = nosteps, mean = 0, sd = sqrt(deltat))
  dW2 <- rnorm(n = nosteps, mean = 0, sd = sqrt(deltat))
  dW3 <- rnorm(n = nosteps, mean = 0, sd = sqrt(deltat))
  dW <- matrix(data = c(dW1, dW2, dW3), ncol = 3)
  
  #Defining drift function
  drift <- function(x){
    x1 <- x[1]
    x2 <- x[2]
    x3 <- x[3]
    y1 <- lambda - rho*x1 - eta*x1*x3
    y2 <- eta*x1*x3 - delta*x2
    y3 <- N1*delta*x2 - C*x3
    y <- c(y1, y2, y3)
    y
  }
  
  #Iterating
  for (k in (1:(nosteps))){
    x[k+1,] <- x[k,] + drift(x[k,])*deltat + sigma*x[k,]*dW[k,] + 0.5*sigma*x[k,]*sigma*(dW[k,]*dW[k,] - deltat)
  }
  
  #Creating output and adding time variable
  t <- (0:nosteps)*deltat
  out <- data.frame(x, t)
  colnames(out) <- c("x1", "x2", "x3", 't')
  out
}
```

```{r}
subsample <- function(x){
  x <- as.matrix(x)
  
  x_fine <- matrix(data = 0, nrow = 500, ncol = 4)
  for (i in (1:500)){
    x_fine[i,] <- x[20*(i-1) + 1,]
  }
  x_fine <- as.data.frame(x_fine)
  colnames(x_fine) <- c('x1','x2','x3','t')
  
  x_coarse <- matrix(data = 0, nrow = 100, ncol = 4)
  for (i in (1:100)){
    x_coarse[i,] <- x[100*(i-1) + 1,]
  }
  x_coarse <- as.data.frame(x_coarse)
  colnames(x_coarse) <- c('x1','x2','x3','t')
  
  out <- list(x_fine, x_coarse)
  out
}
```
```{r}
path <- make_path(tmax = 10, x1_0 = 600, x2_0 = 30, x3_0 = 10^5, rho = 0.108, delta = 0.5, eta = 9.5*10^(-6), lambda = 36, N1 = 1000, C = 3, sigma1 = 0.1, sigma2 = 0.1, sigma3 = 0.1, deltat = 0.001)
sampled_path <- subsample(path)
x_fine <- sampled_path[[1]]
x_coarse <- sampled_path[[2]]
```

###Plotting the trajectory
```{r include=FALSE}
library(plotly)
```
```{r}
path_plot_fine <- plot_ly(x_fine, x = ~x1, y = ~x2, z = ~x3,
                          type = 'scatter3d', mode = 'lines+markers',
                          line = list(width = 6, color = ~t, colorscale = 'Viridis'),
                          marker = list(size = 3.5, color = ~t, colorscale = 'Greens', cmin = -20, cmax = 50))
path_plot_fine
```
```{r}
path_plot_coarse <- plot_ly(x_coarse, x = ~x1, y = ~x2, z = ~x3,
                          type = 'scatter3d', mode = 'lines+markers',
                          line = list(width = 6, color = ~t, colorscale = 'Viridis'),
                          marker = list(size = 3.5, color = ~t, colorscale = 'Greens', cmin = -20, cmax = 50))
path_plot_coarse
```

### Euler-Maruyama Estimator
For making inference, we need to have a likelihood for a ny given parameter $\theta$. This will be the sum of the likelihoods of each step, under the EM-hypothesis. Thus, for each increment, we need to assign a likelihood of that increment, given the parameter vector. In this case, the stochastic increments to each coordinate are independent, due to the diagonal structure of $\Sigma$. So the stochastic quantities here are Normal and Chi-square distributed. But we have a sum here, but one is a deterministic transformation of the other. But there might be more that one dW that give the same increment. In that case i suppose we have to assign both the likelihoods. Do we need to take a convolution to find the density of the sum? Apparently, we are to use a Gaussian approximation as given on the slides. The books probably cover this as well.
Upon looking at the slides again, the EM-based estimator seems quite doable.
Implementing likelihood evaluation for single step
```{r}
likelihood_step <- function(theta, x, x_next, deltat){
  #Defining tunable and constant parameters
  lambda <- theta[1]
  N1 <- theta[2]
  C <- theta[3]
  sigma1 <- theta[4]
  sigma2 <- theta[5]
  sigma3 <- theta[6]
  
  #Defining drift function
  drift <- function(x){
    x1 <- x[[1]]
    x2 <- x[[2]]
    x3 <- x[[3]]
    y <- numeric(length = 3)
    y[1] <- lambda - rho*x1 - eta*x1*x3
    y[2] <- eta*x1*x3 - delta*x2
    y[3] <- N1*delta*x2 - C*x3
    y
  }
  # need to assign subparameters of theta to the names used in the code
  z <- numeric(3)
  z[1] <- x[[1]]
  z[2] <- x[[2]]
  z[3] <- x[[3]]
  #z_next <- numeric(3)
  #z_next[1] <- x_next
  #browser()
  y <- drift(x)
  a <- log(((2*pi*deltat)^3) * (z[1]*z[2]*z[3]*sigma1*sigma2*sigma3)^2)
  b <- ((x_next - x + y*deltat)^2)*c((z[1]*sigma1)^(-2), (z[2]*sigma2)^(-2), (z[3]*sigma3)^(-2))/deltat
  c <- b[[1]]*b[[2]]*b[[3]]
  a+c
  }
```

```{r}
#Test chunk
theta <- c(lambda, N1, C, sigma1, sigma2, sigma3)
likelihood_step(theta = theta, x = x_coarse[1,], x_next = x_coarse[2,], deltat = 0.1)
#We get a number, so that's good.
```
Implementing likelihood for entire process
```{r}
likelihood_total <- function(theta, process, deltat){
  nosteps <- length(process[,1])-1
  cumulative_likelihood <- 0
  for (i in (1:nosteps)){
    cumulative_likelihood <- cumulative_likelihood + likelihood_step(theta = theta, x = process[i,], x_next = process[i+1,], deltat = deltat)
  }
  cumulative_likelihood
}
```
Testing total likelihood function
```{r}
likelihood_total(theta = theta, process = x_coarse, deltat = 0.1)
#We get a number, which is promising
likelihood_total(theta = theta_0, process = x_coarse, deltat = 0.1)
```
We see that evaluating the total likelihood function returns the same number for two different set of parameters.
This indicates that that the total likelihood fucntion doesn't in fact use the parameter provided for evaluation.
It seems that likelihood_step takes its parameters from the global envieronment rather than its arguement.
Ths would explain why the likelihood is evaluated to the same for any parameter.
Now we got different numbers.

Now, we try to optimize this for some parameter vector theta.
```{r}
theta_0 <- c(50, 2000, 10, 0.5, 0.5, 0.5)
optim(par = theta_0, fn = likelihood_total, process = x_coarse, deltat = 0.1, lower = c(0,0,0,0,0,0), method = "L-BFGS-B")
```
It would appear that i have made a mistake: Optim outputs the initial parameters, and states that convergence was not reached.
Another mistake is not finishing a coding project before taking a 2-week break: It's a little difficult to jumo in again.

It takes some time to run now, and we don't get convergence or the right results.
The total likelihood function was called over 400 times, probably because of lack of convergence, which might also be why it takes so long.
Question: are we maximizing or minimizing?
What is the parameter space?
What exactly is the quantity being computed in the step likelihood function? I think it is the likelihood, formula from the slides.
Let's try with some parameters that are closer to the real ones.
We get completely atrocious estimates.
We try bounding barameter values from below with zero and using L-BFGS-B method.
It seems that we get infinite values from total likelihood, which is unfortunate. Maybe the log is fucking me?



